[rocky8_10] History rebuild to kernel-4.18.0-553.47.1.el8_10 #203

PlaidCat · 2025-04-08T22:40:05Z

General Process:

Download all unprocessed src.rpm
For each src.rpm:
- Find all commits in the changelog up to the last known tag (in this case 4.18.0-553)
- Replay the commits in reverse order (oldest in the changelog to newest) with git cherry-pick
- After the replay, replace the ENTIRE code in the branch with the rpmbuild -bp output from the corresponding src.rpm
- Tag the rebuild branch
Do a local build with https://github.com/ctrliq/kernel-src-tree/wiki/Kernel-Make,-KABI,-Install,-and-Reboot-script
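The replay step above can be sketched roughly as follows. This is only an illustration, not the actual rebuild tooling: the function names `oldest_first` and `replay_commits` and the file `shas.txt` are hypothetical, and the handling of a failed pick (abort, then record an empty placeholder commit) is an assumption modelled on the "Empty-Commit:" messages further down in this PR.

```shell
#!/usr/bin/env bash
# Sketch only: oldest_first, replay_commits and shas.txt are hypothetical names.

# The rpm changelog lists the newest entry first, so reverse the list
# to replay the oldest commit first.
oldest_first() {
  tac
}

# Cherry-pick each upstream SHA read from stdin. When a pick does not apply,
# abort it and leave an empty placeholder commit instead (assumed behaviour).
# git's stdin is redirected so it cannot consume the SHA list.
replay_commits() {
  local sha
  while read -r sha; do
    [ -n "$sha" ] || continue
    if ! git cherry-pick -x "$sha" </dev/null; then
      git cherry-pick --abort </dev/null
      git commit --allow-empty \
        -m "Empty-Commit: Cherry-Pick Conflicts during history rebuild ($sha)" </dev/null
    fi
  done
}
```

Inside a kernel git tree, with the newest-first SHAs in a file such as `shas.txt`, this would be run as `oldest_first < shas.txt | replay_commits`. The `-x` flag is what appends the "(cherry picked from commit ...)" line to each replayed commit.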

Contains the following: http://download.rockylinux.org/pub/rocky/8.10/BaseOS/source/tree/Packages/k/

tag: resf_kernel-4.18.0-553.42.1.el8_10 kernel-4.18.0-553.42.1.el8_10.src.rpm
tag: resf_kernel-4.18.0-553.44.1.el8_10 kernel-4.18.0-553.44.1.el8_10.src.rpm
tag: resf_kernel-4.18.0-553.45.1.el8_10 kernel-4.18.0-553.45.1.el8_10.src.rpm
tag: resf_kernel-4.18.0-553.46.1.el8_10 kernel-4.18.0-553.46.1.el8_10.src.rpm
tag: resf_kernel-4.18.0-553.47.1.el8_10 kernel-4.18.0-553.47.1.el8_10.src.rpm

Checking Rebuild Commits for potentially missing commits:

The only one that stood out was this one:
net: skb: exclude the single page frag cache for too small alloc
A search of torvalds/linux does not turn up anything, and the 87.5% fuzzy string matching should have found it if it existed upstream. It will still be included in the splat, and the important thing is that there do not appear to be any other non-upstream commits coming from Red Hat.
https://github.com/search?q=repo%3Atorvalds%2Flinux+%22net%3A+skb%3A+exclude+the+single+page+frag+cache+for+too+small+alloc%22&type=commits

$ ls ciq/ciq_backports/kernel-4.18.0-553.4*/rebuild.details.txt | while read line; do echo $line; cat $line; echo ""; echo ""; done
ciq/ciq_backports/kernel-4.18.0-553.40.1.el8_10/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 524209
Number of commits in rpm: 17
Number of commits matched with upstream: 11 (64.71%)
Number of commits in upstream but not in rpm: 524198
Number of commits NOT found in upstream: 6 (35.29%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.40.1.el8_10 for kernel-4.18.0-553.40.1.el8_10
Clean Cherry Picks: 5 (45.45%)
Empty Cherry Picks: 6 (54.55%)
_______________________________

__EMPTY COMMITS__________________________
0467cdde8c4320bbfdb31a8cff1277b202f677fc s390/pci: Sort PCI functions prior to creating virtual busses
126034faaac5f356822c4a9bebfa75664da11056 s390/pci: Use topology ID for multi-function devices
25f39d3dcb48bbc824a77d16b3d977f0f3713cfe s390/pci: Ignore RID for isolated VFs
48796104c864cf4dafa80bd8c2ce88f9c92a65ea s390/pci: Fix leak of struct zpci_dev when zpci_add_device() fails
5fd11b96b43708f2f6e3964412c301c1bd20ec0f s390/pci: Refactor arch_setup_msi_irqs()
ab42fcb511fd9d241bbab7cc3ca04e34e9fc0666 s390/pci: Allow allocation of more than 1 MSI interrupt

__CHANGES NOT IN UPSTREAM________________
Adding prod certs and changed cert date to 20210620
Adding Rocky secure boot certs
Fixing vmlinuz removal
Fixing UEFI CA path
Porting to 8.10, debranding and Rocky branding
Fixing pesign_key_name values


ciq/ciq_backports/kernel-4.18.0-553.42.1.el8_10/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 14
Number of commits matched with upstream: 7 (50.00%)
Number of commits in upstream but not in rpm: 538200
Number of commits NOT found in upstream: 7 (50.00%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.42.1.el8_10 for kernel-4.18.0-553.42.1.el8_10
Clean Cherry Picks: 6 (85.71%)
Empty Cherry Picks: 1 (14.29%)
_______________________________

__EMPTY COMMITS__________________________
98b37881b7492ae9048ad48260cc8a6ee9eb39fd scsi: st: Don't set pos_unknown just after device recognition

__CHANGES NOT IN UPSTREAM________________
Adding prod certs and changed cert date to 20210620
Adding Rocky secure boot certs
Fixing vmlinuz removal
Fixing UEFI CA path
Porting to 8.10, debranding and Rocky branding
Fixing pesign_key_name values
net: skb: exclude the single page frag cache for too small alloc


ciq/ciq_backports/kernel-4.18.0-553.44.1.el8_10/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 24
Number of commits matched with upstream: 18 (75.00%)
Number of commits in upstream but not in rpm: 538189
Number of commits NOT found in upstream: 6 (25.00%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.44.1.el8_10 for kernel-4.18.0-553.44.1.el8_10
Clean Cherry Picks: 14 (77.78%)
Empty Cherry Picks: 4 (22.22%)
_______________________________

__EMPTY COMMITS__________________________
72ed5d5624af384eaf74d84915810d54486a75e2 net/mlx5: Suspend auxiliary devices only in case of PCI device suspend
aab8e1a200b926147db51e3f82fd07bb9edf6a98 net/mlx5: Reload auxiliary devices in pci error handlers
c79a39dc8d060b9e64e8b0fa9d245d44befeefbe pps: Fix a use-after-free
415d832497098030241605c52ea83d4e2cfa7879 locking/atomic: Make test_and_*_bit() ordered on failure

__CHANGES NOT IN UPSTREAM________________
Adding prod certs and changed cert date to 20210620
Adding Rocky secure boot certs
Fixing vmlinuz removal
Fixing UEFI CA path
Porting to 8.10, debranding and Rocky branding
Fixing pesign_key_name values


ciq/ciq_backports/kernel-4.18.0-553.45.1.el8_10/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 19
Number of commits matched with upstream: 13 (68.42%)
Number of commits in upstream but not in rpm: 538194
Number of commits NOT found in upstream: 6 (31.58%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.45.1.el8_10 for kernel-4.18.0-553.45.1.el8_10
Clean Cherry Picks: 11 (84.62%)
Empty Cherry Picks: 2 (15.38%)
_______________________________

__EMPTY COMMITS__________________________
6cf9ff463317217d95732a6cce6fbdd12508921a net: smc: fix spurious error message from __sock_release()
ba0925c34e0fa6fe02d3d642bc02ab099ab312c7 gve: process XSK TX descriptors as part of RX NAPI

__CHANGES NOT IN UPSTREAM________________
Adding prod certs and changed cert date to 20210620
Adding Rocky secure boot certs
Fixing vmlinuz removal
Fixing UEFI CA path
Porting to 8.10, debranding and Rocky branding
Fixing pesign_key_name values


ciq/ciq_backports/kernel-4.18.0-553.46.1.el8_10/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 12
Number of commits matched with upstream: 6 (50.00%)
Number of commits in upstream but not in rpm: 538201
Number of commits NOT found in upstream: 6 (50.00%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.46.1.el8_10 for kernel-4.18.0-553.46.1.el8_10
Clean Cherry Picks: 6 (100.00%)
Empty Cherry Picks: 0 (0.00%)
_______________________________

__EMPTY COMMITS__________________________

__CHANGES NOT IN UPSTREAM________________
Adding prod certs and changed cert date to 20210620
Adding Rocky secure boot certs
Fixing vmlinuz removal
Fixing UEFI CA path
Porting to 8.10, debranding and Rocky branding
Fixing pesign_key_name values


ciq/ciq_backports/kernel-4.18.0-553.47.1.el8_10/rebuild.details.txt
Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 9
Number of commits matched with upstream: 3 (33.33%)
Number of commits in upstream but not in rpm: 538204
Number of commits NOT found in upstream: 6 (66.67%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.47.1.el8_10 for kernel-4.18.0-553.47.1.el8_10
Clean Cherry Picks: 1 (33.33%)
Empty Cherry Picks: 2 (66.67%)
_______________________________

__EMPTY COMMITS__________________________
8b62645b09f870d70c7910e7550289d444239a46 bpf: Use raw_spinlock_t in ringbuf
f32a213765739f2a1db319346799f130a3d08820 ethtool: runtime-resume netdev parent before ethtool ioctl ops

__CHANGES NOT IN UPSTREAM________________
Adding prod certs and changed cert date to 20210620
Adding Rocky secure boot certs
Fixing vmlinuz removal
Fixing UEFI CA path
Porting to 8.10, debranding and Rocky branding
Fixing pesign_key_name values
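The counts in each rebuild.details.txt above should add up: matched plus not-found commits equal the commits in the rpm, and clean plus empty cherry picks equal the matched commits. A small sketch of that sanity check is below; `check_details` is a hypothetical helper, not part of the rebuild tooling, and it assumes the exact line labels shown in the output above.

```shell
#!/usr/bin/env bash
# Sketch only: check_details is a hypothetical helper.
# Reads one rebuild.details.txt on stdin and checks that the counts add up.
check_details() {
  awk '
    /^Number of commits in rpm:/                { rpm = $NF }
    /^Number of commits matched with upstream:/ { matched = $(NF-1) }
    /^Number of commits NOT found in upstream:/ { missing = $(NF-1) }
    /^Clean Cherry Picks:/                      { clean = $(NF-1) }
    /^Empty Cherry Picks:/                      { empty = $(NF-1) }
    END {
      ok = (matched + missing == rpm) && (clean + empty == matched)
      printf "rpm=%d matched=%d missing=%d clean=%d empty=%d %s\n",
             rpm, matched, missing, clean, empty, (ok ? "OK" : "MISMATCH")
      exit(ok ? 0 : 1)
    }'
}

# Check every details file that exists under the path used in this PR.
for f in ciq/ciq_backports/kernel-4.18.0-553.4*/rebuild.details.txt; do
  [ -e "$f" ] || continue
  printf '%s: ' "$f"
  check_details < "$f"
done
```

For the 553.40.1 file above this gives 11 + 6 = 17 and 5 + 6 = 11, so the helper would report OK.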

BUILD

/mnt/code/kernel-src-tree
no .config file found, moving on
[TIMER]{MRPROPER}: 0s
x86_64 architecture detected, copying config
'configs/kernel-4.18.0-x86_64.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-rocky8_10_rebuild-01aef32f4a9b"
Making olddefconfig
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  HOSTCC  scripts/kconfig/zconf.tab.o
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf  --olddefconfig Kconfig
#
# configuration written to .config
#
Starting Build
scripts/kconfig/conf  --syncconfig Kconfig
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h

[SNIP]

  LD [M]  sound/xen/snd_xen_front.ko
  LD [M]  virt/lib/irqbypass.ko
[TIMER]{BUILD}: 2123s
Making Modules
  INSTALL arch/x86/crypto/blowfish-x86_64.ko
  INSTALL arch/x86/crypto/camellia-aesni-avx-x86_64.ko

[SNIP]

  INSTALL virt/lib/irqbypass.ko
  DEPMOD  4.18.0-rocky8_10_rebuild-01aef32f4a9b+
[TIMER]{MODULES}: 16s
Making Install
sh ./arch/x86/boot/install.sh 4.18.0-rocky8_10_rebuild-01aef32f4a9b+ arch/x86/boot/bzImage \
        System.map "/boot"
[TIMER]{INSTALL}: 22s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-4.18.0-rocky8_10_rebuild-01aef32f4a9b+ and Index to 0
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 0s
[TIMER]{BUILD}: 2123s
[TIMER]{MODULES}: 16s
[TIMER]{INSTALL}: 22s
[TIMER]{TOTAL} 2167s
Rebooting in 10 seconds

Boot

[maple@r8-sigcloud-builder code]$ uname -r
4.18.0-rocky8_10_rebuild-01aef32f4a9b+

Kselftest crash check

Just checking for crashes, since the last commit is a splat of the exploded Rocky kernel.

[maple@r8-sigcloud-builder code]$ grep '^ok ' 4.18.0-rocky8_10_rebuild-01aef32f4a9b+.kself.log | wc -l
206
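The same log can be summarised a little further by counting failures alongside passes. The sketch below assumes the kselftest log uses TAP-style result lines (`ok ...` and `not ok ...`); `kself_summary` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: kself_summary is a hypothetical helper.
# Counts TAP-style "ok" and "not ok" result lines in a kselftest log.
kself_summary() {
  local log=$1
  local ok not_ok
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
  ok=$(grep -c '^ok ' "$log" || true)
  not_ok=$(grep -c '^not ok ' "$log" || true)
  echo "ok=${ok} not_ok=${not_ok}"
}
```

For example, `kself_summary 4.18.0-rocky8_10_rebuild-01aef32f4a9b+.kself.log` would print both counts on one line, which makes it easier to compare runs between rebuilds.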

jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.42.1.el8_10
commit-author Mikulas Patocka <[email protected]>
commit 6e7132e

There was reported lockup when we exit a snapshot with many exceptions. Fix this by adding "cond_resched" to the loop that frees the exceptions.

Reported-by: John Pittman <[email protected]>
Cc: [email protected]
Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
(cherry picked from commit 6e7132e)
Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.42.1.el8_10
commit-author Jason Wang <[email protected]>
commit d71ebe8

Commit a7766ef("virtio_net: disable cb aggressively") enables virtqueue callback via the following statement:

        do {
                if (use_napi)
                        virtqueue_disable_cb(sq->vq);
                free_old_xmit_skbs(sq, false);
        } while (use_napi && kick &&
                 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));

When NAPI is used and kick is false, the callback won't be enabled here. And when the virtqueue is about to be full, the tx will be disabled, but we still don't enable tx interrupt which will cause a TX hang. This could be observed when using pktgen with burst enabled.

TO be consistent with the logic that tries to disable cb only for NAPI, fixing this by trying to enable delayed callback only when NAPI is enabled when the queue is about to be full.

Fixes: a7766ef ("virtio_net: disable cb aggressively")
Signed-off-by: Jason Wang <[email protected]>
Tested-by: Laurent Vivier <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit d71ebe8)
Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.42.1.el8_10
commit-author Anumula Murali Mohan Reddy <[email protected]>
commit c659b40

ip_dev_find() always returns real net_device address, whether traffic is running on a vlan or real device, if traffic is over vlan, filling endpoint struture with real ndev and an attempt to send a connect request will results in RDMA_CM_EVENT_UNREACHABLE error. This patch fixes the issue by using vlan_dev_real_dev().

Fixes: 830662f ("RDMA/cxgb4: Add support for active and passive open connection with IPv6 address")
Link: https://patch.msgid.link/r/[email protected]
Signed-off-by: Anumula Murali Mohan Reddy <[email protected]>
Signed-off-by: Potnuri Bharat Teja <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit c659b40)
Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.42.1.el8_10
commit-author Chen Zhongjin <[email protected]>
commit 672e426

ovl_dentry_revalidate_common() can be called in rcu-walk mode. As document said, "in rcu-walk mode, d_parent and d_inode should not be used without care".

Check inode here to protect access under rcu-walk mode.

Fixes: bccece1 ("ovl: allow remote upper")
Reported-and-tested-by: [email protected]
Signed-off-by: Chen Zhongjin <[email protected]>
Cc: <[email protected]> # v5.7
Signed-off-by: Miklos Szeredi <[email protected]>
(cherry picked from commit 672e426)
Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.42.1.el8_10
commit-author Kai Mäkisara <[email protected]>
commit 98b3788
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.42.1.el8_10/98b37881.failed

Commit 9604eea ("scsi: st: Add third party poweron reset handling") in v6.6 added new code to handle the Power On/Reset Unit Attention (POR UA) sense data. This was in addition to the existing method. When this Unit Attention is received, the driver blocks attempts to read, write and some other operations because the reset may have rewinded the tape. Because of the added code, also the initial POR UA resulted in blocking operations, including those that are used to set the driver options after the device is recognized. Also, reading and writing are refused, whereas they succeeded before this commit.

Add code to not set pos_unknown to block operations if the POR UA is received from the first test_ready() call after the st device has been created. This restores the behavior before v6.6.

Signed-off-by: Kai Mäkisara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Fixes: 9604eea ("scsi: st: Add third party poweron reset handling")
CC: [email protected]
Closes: https://lore.kernel.org/linux-scsi/[email protected]/
Signed-off-by: Martin K. Petersen <[email protected]>
(cherry picked from commit 98b3788)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
# drivers/scsi/st.c

…le_direct_reclaim()
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.42.1.el8_10
commit-author Seiji Nishikawa <[email protected]>
commit 6aaced5

The task sometimes continues looping in throttle_direct_reclaim() because allow_direct_reclaim(pgdat) keeps returning false.

 #0 [ffff80002cb6f8d0] __switch_to at ffff8000080095ac
 #1 [ffff80002cb6f900] __schedule at ffff800008abbd1c
 #2 [ffff80002cb6f990] schedule at ffff800008abc50c
 #3 [ffff80002cb6f9b0] throttle_direct_reclaim at ffff800008273550
 #4 [ffff80002cb6fa20] try_to_free_pages at ffff800008277b68
 #5 [ffff80002cb6fae0] __alloc_pages_nodemask at ffff8000082c4660
 #6 [ffff80002cb6fc50] alloc_pages_vma at ffff8000082e4a98
 #7 [ffff80002cb6fca0] do_anonymous_page at ffff80000829f5a8
 #8 [ffff80002cb6fce0] __handle_mm_fault at ffff8000082a5974
 #9 [ffff80002cb6fd90] handle_mm_fault at ffff8000082a5bd4

At this point, the pgdat contains the following two zones:

        NODE: 4  ZONE: 0  ADDR: ffff00817fffe540  NAME: "DMA32"
          SIZE: 20480  MIN/LOW/HIGH: 11/28/45
          VM_STAT:
                NR_FREE_PAGES: 359
                NR_ZONE_INACTIVE_ANON: 18813
                NR_ZONE_ACTIVE_ANON: 0
                NR_ZONE_INACTIVE_FILE: 50
                NR_ZONE_ACTIVE_FILE: 0
                NR_ZONE_UNEVICTABLE: 0
                NR_ZONE_WRITE_PENDING: 0
                NR_MLOCK: 0
                NR_BOUNCE: 0
                NR_ZSPAGES: 0
                NR_FREE_CMA_PAGES: 0

        NODE: 4  ZONE: 1  ADDR: ffff00817fffec00  NAME: "Normal"
          SIZE: 8454144  PRESENT: 98304  MIN/LOW/HIGH: 68/166/264
          VM_STAT:
                NR_FREE_PAGES: 146
                NR_ZONE_INACTIVE_ANON: 94668
                NR_ZONE_ACTIVE_ANON: 3
                NR_ZONE_INACTIVE_FILE: 735
                NR_ZONE_ACTIVE_FILE: 78
                NR_ZONE_UNEVICTABLE: 0
                NR_ZONE_WRITE_PENDING: 0
                NR_MLOCK: 0
                NR_BOUNCE: 0
                NR_ZSPAGES: 0
                NR_FREE_CMA_PAGES: 0

In allow_direct_reclaim(), while processing ZONE_DMA32, the sum of inactive/active file-backed pages calculated in zone_reclaimable_pages() based on the result of zone_page_state_snapshot() is zero.

Additionally, since this system lacks swap, the calculation of inactive/active anonymous pages is skipped.

        crash> p nr_swap_pages
        nr_swap_pages = $1937 = {
          counter = 0
        }

As a result, ZONE_DMA32 is deemed unreclaimable and skipped, moving on to the processing of the next zone, ZONE_NORMAL, despite ZONE_DMA32 having free pages significantly exceeding the high watermark.

The problem is that the pgdat->kswapd_failures hasn't been incremented.

        crash> px ((struct pglist_data *) 0xffff00817fffe540)->kswapd_failures
        $1935 = 0x0

This is because the node deemed balanced. The node balancing logic in balance_pgdat() evaluates all zones collectively. If one or more zones (e.g., ZONE_DMA32) have enough free pages to meet their watermarks, the entire node is deemed balanced. This causes balance_pgdat() to exit early before incrementing the kswapd_failures, as it considers the overall memory state acceptable, even though some zones (like ZONE_NORMAL) remain under significant pressure.

The patch ensures that zone_reclaimable_pages() includes free pages (NR_FREE_PAGES) in its calculation when no other reclaimable pages are available (e.g., file-backed or anonymous pages). This change prevents zones like ZONE_DMA32, which have sufficient free pages, from being mistakenly deemed unreclaimable. By doing so, the patch ensures proper node balancing, avoids masking pressure on other zones like ZONE_NORMAL, and prevents infinite loops in throttle_direct_reclaim() caused by allow_direct_reclaim(pgdat) repeatedly returning false.

The kernel hangs due to a task stuck in throttle_direct_reclaim(), caused by a node being incorrectly deemed balanced despite pressure in certain zones, such as ZONE_NORMAL. This issue arises from zone_reclaimable_pages() returning 0 for zones without reclaimable file-backed or anonymous pages, causing zones like ZONE_DMA32 with sufficient free pages to be skipped.

The lack of swap or reclaimable pages results in ZONE_DMA32 being ignored during reclaim, masking pressure in other zones. Consequently, pgdat->kswapd_failures remains 0 in balance_pgdat(), preventing fallback mechanisms in allow_direct_reclaim() from being triggered, leading to an infinite loop in throttle_direct_reclaim().

This patch modifies zone_reclaimable_pages() to account for free pages (NR_FREE_PAGES) when no other reclaimable pages exist. This ensures zones with sufficient free pages are not skipped, enabling proper balancing and reclaim behavior.

[[email protected]: coding-style cleanups]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5a1c84b ("mm: remove reclaim and compaction retry approximations")
Signed-off-by: Seiji Nishikawa <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
(cherry picked from commit 6aaced5)
Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.42.1.el8_10
commit-author Chuck Lever <[email protected]>
commit 961b4b5

I noticed that once an NFSv4.1 callback operation gets a NFS4ERR_DELAY status on CB_SEQUENCE and then the connection is lost, the callback client loops, resending it indefinitely.

The switch arm in nfsd4_cb_sequence_done() that handles NFS4ERR_DELAY uses rpc_restart_call() to rearm the RPC state machine for the retransmit, but that path does not call the rpc_prepare_call callback again. Thus cb_seq_status is set to -10008 by the first NFS4ERR_DELAY result, but is never set back to 1 for the retransmits.

nfsd4_cb_sequence_done() thinks it's getting nothing but a long series of CB_SEQUENCE NFS4ERR_DELAY replies.

Fixes: 7ba6cad ("nfsd: New helper nfsd4_cb_sequence_done() for processing more cb errors")
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: Benjamin Coddington <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
(cherry picked from commit 961b4b5)
Signed-off-by: Jonathan Maple <[email protected]>

Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 14
Number of commits matched with upstream: 7 (50.00%)
Number of commits in upstream but not in rpm: 538200
Number of commits NOT found in upstream: 7 (50.00%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.42.1.el8_10 for kernel-4.18.0-553.42.1.el8_10
Clean Cherry Picks: 6 (85.71%)
Empty Cherry Picks: 1 (14.29%)
_______________________________

Full Details Located here:
ciq/ciq_backports/kernel-4.18.0-553.42.1.el8_10/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual Empty Commit failures contained in the same containing directory.
The git message for empty commits will have the path for the failed commit.
File names are the first 8 characters of the upstream SHA

jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Jiri Pirko <[email protected]>
commit 72ed5d5
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.44.1.el8_10/72ed5d56.failed

The original behavior introduced by commit c6acd62 ("net/mlx5e: Add support for devlink-port in non-representors mode") correctly re-instantiated uplink devlink port and related netdevice during devlink reload. However with migration to auxiliary devices, this behaviour changed.

Restore the original behaviour and tear down auxiliary devices completely during devlink reload.

Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
(cherry picked from commit 72ed5d5)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
# drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
# drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h

jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Moshe Shemesh <[email protected]>
commit aab8e1a
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.44.1.el8_10/aab8e1a2.failed

Handling pci errors should fully teardown and load back auxiliary devices, same as done through mlx5 health recovery flow.

Fixes: 72ed5d5 ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend")
Signed-off-by: Moshe Shemesh <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
(cherry picked from commit aab8e1a)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
# drivers/net/ethernet/mellanox/mlx5/core/main.c

jira LE-2741
cve CVE-2024-57807
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Tomas Henzl <[email protected]>
commit 50740f4

This fixes a 'possible circular locking dependency detected' warning

      CPU0                    CPU1
      ----                    ----
 lock(&instance->reset_mutex);
                              lock(&shost->scan_mutex);
                              lock(&instance->reset_mutex);
 lock(&shost->scan_mutex);

Fix this by temporarily releasing the reset_mutex.

Signed-off-by: Tomas Henzl <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Chandrakanth Patil <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
(cherry picked from commit 50740f4)
Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Nina Schoetterl-Glausch <[email protected]>
commit 22fdd8b

In order for SIE to interpretively execute STFLE, it requires the real or absolute address of a facility-list control block. Before writing the location into the shadow SIE control block, convert it from a virtual address. We currently do not run into this bug because the lower 31 bits are the same for virtual and physical addresses.

Signed-off-by: Nina Schoetterl-Glausch <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Janosch Frank <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Alexander Gordeev <[email protected]>
(cherry picked from commit 22fdd8b)
Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Nina Schoetterl-Glausch <[email protected]>
commit cc4edb9

The address of the crypto control block in the (shadow) SIE block is absolute/physical. Convert from virtual to physical when shadowing the guest's control block during VSIE.

Signed-off-by: Nina Schoetterl-Glausch <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Acked-by: Alexander Gordeev <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexander Gordeev <[email protected]>
(cherry picked from commit cc4edb9)
Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Claudio Imbrenda <[email protected]>
commit cff59d8

The return value uv_set_shared() and uv_remove_shared() (which are wrappers around the share() function) is not always checked. The system integrity of a protected guest depends on the Share and Unshare UVCs being successful. This means that any caller that fails to check the return value will compromise the security of the protected guest.

No code path that would lead to such violation of the security guarantees is currently exercised, since all the areas that are shared never get unshared during the lifetime of the system. This might change and become an issue in the future.

The Share and Unshare UVCs can only fail in case of hypervisor misbehaviour (either a bug or malicious behaviour). In such cases there is no reasonable way forward, and the system needs to panic.

This patch replaces the return at the end of the share() function with a panic, to guarantee system integrity.

Fixes: 5abb935 ("s390/uv: introduce guest side ultravisor code")
Signed-off-by: Claudio Imbrenda <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Reviewed-by: Steffen Eiden <[email protected]>
Reviewed-by: Janosch Frank <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Message-ID: <[email protected]>
[[email protected]: Fixed up patch subject]
Signed-off-by: Janosch Frank <[email protected]>
(cherry picked from commit cff59d8)
Signed-off-by: Jonathan Maple <[email protected]>

…query
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Hariharan Mari <[email protected]>
commit 09c38ad

The __insn32_query() function incorrectly uses the RRF instruction format for both the SORTL (RRE format) and DFLTCC (RRF format) instructions. To fix this issue, add separate query functions for SORTL and DFLTCC that use the appropriate instruction formats.

Additionally pass the query operand as a pointer to the entire array of 32 elements to slightly optimize performance and readability.

Fixes: d668139 ("KVM: s390: provide query function for instructions returning 32 byte")
Suggested-by: Heiko Carstens <[email protected]>
Reviewed-by: Juergen Christ <[email protected]>
Signed-off-by: Hariharan Mari <[email protected]>
Signed-off-by: Janosch Frank <[email protected]>
(cherry picked from commit 09c38ad)
Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Nico Boehr <[email protected]>
commit e8061f0

Previously, access_guest_page() did not check whether the given guest address is inside of a memslot. This is not a problem, since kvm_write_guest_page/kvm_read_guest_page return -EFAULT in this case.

However, -EFAULT is also returned when copy_to/from_user fails.

When emulating a guest instruction, the address being outside a memslot usually means that an addressing exception should be injected into the guest.

Failure in copy_to/from_user however indicates that something is wrong in userspace and hence should be handled there.

To be able to distinguish these two cases, return PGM_ADDRESSING in access_guest_page() when the guest address is outside guest memory. In access_guest_real(), populate vcpu->arch.pgm.code such that kvm_s390_inject_prog_cond() can be used in the caller for injecting into the guest (if applicable).

Since this adds a new return value to access_guest_page(), we need to make sure that other callers are not confused by the new positive return value.

There are the following users of access_guest_page():
- access_guest_with_key() does the checking itself (in guest_range_to_gpas()), so this case should never happen. Even if, the handling is set up properly.
- access_guest_real() just passes the return code to its callers, which are:
    - read_guest_real() - see below
    - write_guest_real() - see below

There are the following users of read_guest_real():
- ar_translation() in gaccess.c which already returns PGM_*
- setup_apcb10(), setup_apcb00(), setup_apcb11() in vsie.c which always return -EFAULT on read_guest_read() nonzero return - no change
- shadow_crycb(), handle_stfle() always present this as validity, this could be handled better but doesn't change current behaviour - no change

There are the following users of write_guest_real():
- kvm_s390_store_status_unloaded() always returns -EFAULT on write_guest_real() failure.

Fixes: 2293897 ("KVM: s390: add architecture compliant guest access functions")
Cc: [email protected]
Signed-off-by: Nico Boehr <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Janosch Frank <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
(cherry picked from commit e8061f0)
Signed-off-by: Jonathan Maple <[email protected]>

…ndler
jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Michael Mueller <[email protected]>
commit cad4b3d

The parameters for the diag 0x258 are real addresses, not virtual, but KVM was using them as virtual addresses. This only happened to work, since the Linux kernel as a guest used to have a 1:1 mapping for physical vs virtual addresses.

Fix KVM so that it correctly uses the addresses as real addresses.

Cc: [email protected]
Fixes: 8ae04b8 ("KVM: s390: Guest's memory access functions get access registers")
Suggested-by: Vasily Gorbik <[email protected]>
Signed-off-by: Michael Mueller <[email protected]>
Signed-off-by: Nico Boehr <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Janosch Frank <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
(cherry picked from commit cad4b3d)
Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10
commit-author Calvin Owens <[email protected]>
commit c79a39d
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.44.1.el8_10/c79a39dc.failed

On a board running ntpd and gpsd, I'm seeing a consistent use-after-free in sys_exit() from gpsd when rebooting:

    pps pps1: removed
    ------------[ cut here ]------------
    kobject: '(null)' (00000000db4bec24): is not initialized, yet kobject_put() is being called.
    WARNING: CPU: 2 PID: 440 at lib/kobject.c:734 kobject_put+0x120/0x150
    CPU: 2 UID: 299 PID: 440 Comm: gpsd Not tainted 6.11.0-rc6-00308-gb31c44928842 #1
    Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
    pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : kobject_put+0x120/0x150
    lr : kobject_put+0x120/0x150
    sp : ffffffc0803d3ae0
    x29: ffffffc0803d3ae0 x28: ffffff8042dc9738 x27: 0000000000000001
    x26: 0000000000000000 x25: ffffff8042dc9040 x24: ffffff8042dc9440
    x23: ffffff80402a4620 x22: ffffff8042ef4bd0 x21: ffffff80405cb600
    x20: 000000000008001b x19: ffffff8040b3b6e0 x18: 0000000000000000
    x17: 0000000000000000 x16: 0000000000000000 x15: 696e6920746f6e20
    x14: 7369203a29343263 x13: 205d303434542020 x12: 0000000000000000
    x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
    x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
    x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
    x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
    Call trace:
     kobject_put+0x120/0x150
     cdev_put+0x20/0x3c
     __fput+0x2c4/0x2d8
     ____fput+0x1c/0x38
     task_work_run+0x70/0xfc
     do_exit+0x2a0/0x924
     do_group_exit+0x34/0x90
     get_signal+0x7fc/0x8c0
     do_signal+0x128/0x13b4
     do_notify_resume+0xdc/0x160
     el0_svc+0xd4/0xf8
     el0t_64_sync_handler+0x140/0x14c
     el0t_64_sync+0x190/0x194
    ---[ end trace 0000000000000000 ]---

...followed by more symptoms of corruption, with similar stacks:

    refcount_t: underflow; use-after-free.
    kernel BUG at lib/list_debug.c:62!
    Kernel panic - not syncing: Oops - BUG: Fatal exception

This happens because pps_device_destruct() frees the pps_device with the embedded cdev immediately after calling cdev_del(), but, as the comment above cdev_del() notes, fops for previously opened cdevs are still callable even after cdev_del() returns. I think this bug has always been there: I can't explain why it suddenly started happening every time I reboot this particular board.

In commit d953e0e ("pps: Fix a use-after free bug when unregistering a source."), George Spelvin suggested removing the embedded cdev. That seems like the simplest way to fix this, so I've implemented his suggestion, using __register_chrdev() with pps_idr becoming the source of truth for which minor corresponds to which device.

But now that pps_idr defines userspace visibility instead of cdev_add(), we need to be sure the pps->dev refcount can't reach zero while userspace can still find it again. So, the idr_remove() call moves to pps_unregister_cdev(), and pps_idr now holds a reference to pps->dev.

    pps_core: source serial1 got cdev (251:1)
    <...>
    pps pps1: removed
    pps_core: unregistering pps1
    pps_core: deallocating pps1

Fixes: d953e0e ("pps: Fix a use-after free bug when unregistering a source.")
Cc: [email protected]
Signed-off-by: Calvin Owens <[email protected]>
Reviewed-by: Michal Schmidt <[email protected]>
Link: https://lore.kernel.org/r/a17975fd5ae99385791929e563f72564edbcf28f.1731383727.git.calvin@wbinvd.org
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit c79a39d)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
# drivers/pps/clients/pps-gpio.c
# drivers/pps/clients/pps-ldisc.c
# drivers/pps/pps.c
# drivers/ptp/ptp_ocp.c

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10 commit-author Hector Martin <[email protected]> commit 415d832 Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.44.1.el8_10/415d8324.failed These operations are documented as always ordered in include/asm-generic/bitops/instrumented-atomic.h, and producer-consumer type use cases where one side needs to ensure a flag is left pending after some shared data was updated rely on this ordering, even in the failure case. This is the case with the workqueue code, which currently suffers from a reproducible ordering violation on Apple M1 platforms (which are notoriously out-of-order) that ends up causing the TTY layer to fail to deliver data to userspace properly under the right conditions. This change fixes that bug. Change the documentation to restrict the "no order on failure" story to the _lock() variant (for which it makes sense), and remove the early-exit from the generic implementation, which is what causes the missing barrier semantics in that case. Without this, the remaining atomic op is fully ordered (including on ARM64 LSE, as of recent versions of the architecture spec). Suggested-by: Linus Torvalds <[email protected]> Cc: [email protected] Fixes: e986a0d ("locking/atomics, asm-generic/bitops/atomic.h: Rewrite using atomic_*() APIs") Fixes: 61e0239 ("locking/atomic/bitops: Document and clarify ordering semantics for failed test_and_{}_bit()") Signed-off-by: Hector Martin <[email protected]> Acked-by: Will Deacon <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> (cherry picked from commit 415d832) Signed-off-by: Jonathan Maple <[email protected]> # Conflicts: # include/asm-generic/bitops/atomic.h

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10 commit-author Peter Zijlstra <[email protected]> commit be24226 Because of late module patching, a livepatch module needs to be able to apply some of its relocations well after it has been loaded. Instead of playing games with module_{dis,en}able_ro(), use existing text poking mechanisms to apply relocations after module loading. So far only x86, s390 and Power have HAVE_LIVEPATCH but only the first two also have STRICT_MODULE_RWX. This will allow removal of the last module_disable_ro() usage in livepatch. The ultimate goal is to completely disallow making executable mappings writable. [ jpoimboe: Split up patches. Use mod state to determine whether memcpy() can be used. Test and add fixes. ] Cc: [email protected] Cc: Heiko Carstens <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: Christian Borntraeger <[email protected]> Suggested-by: Josh Poimboeuf <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Joe Lawrence <[email protected]> Acked-by: Miroslav Benes <[email protected]> Acked-by: Gerald Schaefer <[email protected]> # s390 Signed-off-by: Jiri Kosina <[email protected]> (cherry picked from commit be24226) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10 commit-author Ilya Leoshkevich <[email protected]> commit f3b7e73 If the size of the PLT entries generated by apply_rela() exceeds 64KiB, the first ones can no longer reach __jump_r1 with brc. Fix by using brcl. An alternative solution is to add a __jump_r1 copy after every 64KiB, however, the space savings are quite small and do not justify the additional complexity. Fixes: f19fbd5 ("s390: introduce execute-trampolines for branches") Cc: [email protected] Reported-by: Andrea Righi <[email protected]> Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Christian Borntraeger <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> (cherry picked from commit f3b7e73) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10 commit-author Manuel Barrio Linares <[email protected]> commit 44f69dd This adds support for all sample rates supported by the hardware,Digidesign Mbox 3 supports: {44100, 48000, 88200, 96000} Fixes syncing clock issues that presented as pops. To test this, without this patch playing 440hz tone produces pops. Clock is now synced between playback and capture interfaces so no more latency drift issue when using pipewire pro-profile. (https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/3900) Signed-off-by: Manuel Barrio Linares <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]> (cherry picked from commit 44f69dd) Signed-off-by: Jonathan Maple <[email protected]>

…box devices jira LE-2741 cve CVE-2024-53197 Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10 commit-author Benoît Sevens <[email protected]> commit b909df1 A bogus device can provide a bNumConfigurations value that exceeds the initial value used in usb_get_configuration for allocating dev->config. This can lead to out-of-bounds accesses later, e.g. in usb_destroy_configuration. Signed-off-by: Benoît Sevens <[email protected]> Fixes: 1da177e ("Linux-2.6.12-rc2") Cc: [email protected] Link: https://patch.msgid.link/[email protected] Signed-off-by: Takashi Iwai <[email protected]> (cherry picked from commit b909df1) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10 commit-author Manuel Barrio Linares <[email protected]> commit 5005ccd Fixed wrong use of usb_sndctrlpipe to usb_rcvctrlpipe Fixes: 44f69dd ("ALSA: usb-audio: Add sampling rates support for Mbox3") Signed-off-by: Manuel Barrio Linares <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]> (cherry picked from commit 5005ccd) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10 commit-author Dan Carpenter <[email protected]> commit f7d306b The usb_get_descriptor() function does DMA so we're not allowed to use a stack buffer for that. Doing DMA to the stack is not portable all architectures. Move the "new_device_descriptor" from being stored on the stack and allocate it with kmalloc() instead. Fixes: b909df1 ("ALSA: usb-audio: Fix potential out-of-bound accesses for Extigy and Mbox devices") Cc: [email protected] Signed-off-by: Dan Carpenter <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Takashi Iwai <[email protected]> (cherry picked from commit f7d306b) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741 cve CVE-2024-50302 Rebuild_History Non-Buildable kernel-4.18.0-553.44.1.el8_10 commit-author Jiri Kosina <[email protected]> commit 177f25d Since the report buffer is used by all kinds of drivers in various ways, let's zero-initialize it during allocation to make sure that it can't be ever used to leak kernel memory via specially-crafted report. Fixes: 27ce405 ("HID: fix data access in implement()") Reported-by: Benoît Sevens <[email protected]> Acked-by: Benjamin Tissoires <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> (cherry picked from commit 177f25d) Signed-off-by: Jonathan Maple <[email protected]>

Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 24
Number of commits matched with upstream: 18 (75.00%)
Number of commits in upstream but not in rpm: 538189
Number of commits NOT found in upstream: 6 (25.00%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.44.1.el8_10 for kernel-4.18.0-553.44.1.el8_10
Clean Cherry Picks: 14 (77.78%)
Empty Cherry Picks: 4 (22.22%)
_______________________________

Full Details Located here:
ciq/ciq_backports/kernel-4.18.0-553.44.1.el8_10/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual Empty Commit failures contained in the same containing directory.
The git message for empty commits will have the path for the failed commit.
File names are the first 8 characters of the upstream SHA

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.45.1.el8_10 commit-author Dmitry Antipov <[email protected]> commit 6cf9ff4 Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.45.1.el8_10/6cf9ff46.failed Commit 67f562e ("net/smc: transfer fasync_list in case of fallback") leaves the socket's fasync list pointer within a container socket as well. When the latter is destroyed, '__sock_release()' warns about its non-empty fasync list, which is a dangling pointer to previously freed fasync list of an underlying TCP socket. Fix this spurious warning by nullifying fasync list of a container socket. Fixes: 67f562e ("net/smc: transfer fasync_list in case of fallback") Signed-off-by: Dmitry Antipov <[email protected]> Signed-off-by: David S. Miller <[email protected]> (cherry picked from commit 6cf9ff4) Signed-off-by: Jonathan Maple <[email protected]> # Conflicts: # net/smc/af_smc.c

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.45.1.el8_10 commit-author Guangguan Wang <[email protected]> commit c12b270 AF_INET6 is not supported for smc-r v2 client before, even if the ipv6 addr is ipv4 mapped. Thus, when using AF_INET6, smc-r connection will fallback to tcp, especially for java applications running smc-r. This patch support ipv4 mapped ipv6 addr client for smc-r v2. Clients using real global ipv6 addr is still not supported yet. Signed-off-by: Guangguan Wang <[email protected]> Reviewed-by: Wen Gu <[email protected]> Reviewed-by: Dust Li <[email protected]> Reviewed-by: D. Wythe <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Reviewed-by: Halil Pasic <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> (cherry picked from commit c12b270) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.45.1.el8_10 commit-author Andreas Gruenbacher <[email protected]> commit 5788253 Add a number of glock flags are currently not shown in the text form of glock tracepoints. Signed-off-by: Andreas Gruenbacher <[email protected]> (cherry picked from commit 5788253) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.45.1.el8_10 commit-author Bailey Forrest <[email protected]> commit 36e3b94 The NIC requires each TSO segment to not span more than 10 descriptors. NIC further requires each descriptor to not exceed 16KB - 1 (GVE_TX_MAX_BUF_SIZE_DQO). The descriptors for an skb are generated by gve_tx_add_skb_no_copy_dqo() for DQO RDA queue format. gve_tx_add_skb_no_copy_dqo() loops through each skb frag and generates a descriptor for the entire frag if the frag size is not greater than GVE_TX_MAX_BUF_SIZE_DQO. If the frag size is greater than GVE_TX_MAX_BUF_SIZE_DQO, it is split into descriptor(s) of size GVE_TX_MAX_BUF_SIZE_DQO and a descriptor is generated for the remainder (frag size % GVE_TX_MAX_BUF_SIZE_DQO). gve_can_send_tso() checks if the descriptors thus generated for an skb would meet the requirement that each TSO-segment not span more than 10 descriptors. However, the current code misses an edge case when a TSO segment spans multiple descriptors within a large frag. This change fixes the edge case. gve_can_send_tso() relies on the assumption that max gso size (9728) is less than GVE_TX_MAX_BUF_SIZE_DQO and therefore within an skb fragment a TSO segment can never span more than 2 descriptors. Fixes: a57e5de ("gve: DQO: Add TX path") Signed-off-by: Praveen Kaligineedi <[email protected]> Signed-off-by: Bailey Forrest <[email protected]> Reviewed-by: Jeroen de Borst <[email protected]> Cc: [email protected] Reviewed-by: Willem de Bruijn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> (cherry picked from commit 36e3b94) Signed-off-by: Jonathan Maple <[email protected]>
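
The descriptor arithmetic in the commit message above can be sketched roughly as follows. This is an illustration only, not driver code: `descriptors_for_frag` is a hypothetical helper, and the constants simply mirror the "16KB - 1" limit and the max gso size (9728) quoted in the message.

```python
# Illustrative values taken from the commit message text, not from the driver source.
GVE_TX_MAX_BUF_SIZE_DQO = 16 * 1024 - 1  # "16KB - 1" per-descriptor limit
MAX_GSO_SIZE = 9728                      # max gso size quoted in the message


def descriptors_for_frag(frag_size: int) -> int:
    """Hypothetical helper: descriptors needed to cover one skb frag.

    A frag no larger than the limit takes one descriptor; a larger frag is
    split into full-size descriptors plus one for any remainder.
    """
    if frag_size <= 0:
        return 0
    full, remainder = divmod(frag_size, GVE_TX_MAX_BUF_SIZE_DQO)
    return full + (1 if remainder else 0)


if __name__ == "__main__":
    for size in (4096, GVE_TX_MAX_BUF_SIZE_DQO, GVE_TX_MAX_BUF_SIZE_DQO + 1, 40000):
        print(size, descriptors_for_frag(size))
```

Because the quoted max gso size is smaller than the per-descriptor limit, one TSO segment cannot fully contain a full-size descriptor, which is the assumption the message says gve_can_send_tso() relies on.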

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.45.1.el8_10 commit-author Joshua Washington <[email protected]> commit ff7c2de In GVE, dedicated XDP queues only exist when an XDP program is installed and the interface is up. As such, the NDO XDP XMIT callback should return early if either of these conditions are false. In the case of no loaded XDP program, priv->num_xdp_queues=0 which can cause a divide-by-zero error, and in the case of interface down, num_xdp_queues remains untouched to persist XDP queue count for the next interface up, but the TX pointer itself would be NULL. The XDP xmit callback also needs to synchronize with a device transitioning from open to close. This synchronization will happen via the GVE_PRIV_FLAGS_NAPI_ENABLED bit along with a synchronize_net() call, which waits for any RCU critical sections at call-time to complete. Fixes: 39a7f4a ("gve: Add XDP REDIRECT support for GQI-QPL format") Cc: [email protected] Signed-off-by: Joshua Washington <[email protected]> Signed-off-by: Praveen Kaligineedi <[email protected]> Reviewed-by: Praveen Kaligineedi <[email protected]> Reviewed-by: Shailend Chand <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]> (cherry picked from commit ff7c2de) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.45.1.el8_10 commit-author Joshua Washington <[email protected]> commit 40338d7 This patch predicates the enabling and disabling of XSK pools on the existence of queues. As it stands, if the interface is down, disabling or enabling XSK pools would result in a crash, as the RX queue pointer would be NULL. XSK pool registration will occur as part of the next interface up. Similarly, xsk_wakeup needs be guarded against queues disappearing while the function is executing, so a check against the GVE_PRIV_FLAGS_NAPI_ENABLED flag is added to synchronize with the disabling of the bit and the synchronize_net() in gve_turndown. Fixes: fd8e403 ("gve: Add AF_XDP zero-copy support for GQI-QPL format") Cc: [email protected] Signed-off-by: Joshua Washington <[email protected]> Signed-off-by: Praveen Kaligineedi <[email protected]> Reviewed-by: Praveen Kaligineedi <[email protected]> Reviewed-by: Shailend Chand <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Reviewed-by: Larysa Zaremba <[email protected]> Signed-off-by: David S. Miller <[email protected]> (cherry picked from commit 40338d7) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.45.1.el8_10 commit-author Joshua Washington <[email protected]> commit ba0925c Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.45.1.el8_10/ba0925c3.failed When busy polling is enabled, xsk_sendmsg for AF_XDP zero copy marks the NAPI ID corresponding to the memory pool allocated for the socket. In GVE, this NAPI ID will never correspond to a NAPI ID of one of the dedicated XDP TX queues registered with the umem because XDP TX is not set up to share a NAPI with a corresponding RX queue. This patch moves XSK TX descriptor processing from the TX NAPI to the RX NAPI, and the gve_xsk_wakeup callback is updated to use the RX NAPI instead of the TX NAPI, accordingly. The branch on if the wakeup is for TX is removed, as the NAPI poll should be invoked whether the wakeup is for TX or for RX. Fixes: fd8e403 ("gve: Add AF_XDP zero-copy support for GQI-QPL format") Cc: [email protected] Signed-off-by: Praveen Kaligineedi <[email protected]> Signed-off-by: Joshua Washington <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]> (cherry picked from commit ba0925c) Signed-off-by: Jonathan Maple <[email protected]> # Conflicts: # drivers/net/ethernet/google/gve/gve.h

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.45.1.el8_10 commit-author Joshua Washington <[email protected]> commit fb3a9a1 Commit ba0925c ("gve: process XSK TX descriptors as part of RX NAPI") moved XSK TX processing to be part of the RX NAPI. However, that commit did not include triggering the RX NAPI in gve_xsk_wakeup. This is necessary because the TX NAPI only processes TX completions, meaning that a TX wakeup would not actually trigger XSK descriptor processing. Also, the branch on XDP_WAKEUP_TX was supposed to have been removed, as the NAPI should be scheduled whether the wakeup is for RX or TX. Fixes: ba0925c ("gve: process XSK TX descriptors as part of RX NAPI") Cc: [email protected] Signed-off-by: Joshua Washington <[email protected]> Signed-off-by: Praveen Kaligineedi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> (cherry picked from commit fb3a9a1) Signed-off-by: Jonathan Maple <[email protected]>

Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 19
Number of commits matched with upstream: 13 (68.42%)
Number of commits in upstream but not in rpm: 538194
Number of commits NOT found in upstream: 6 (31.58%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.45.1.el8_10 for kernel-4.18.0-553.45.1.el8_10
Clean Cherry Picks: 11 (84.62%)
Empty Cherry Picks: 2 (15.38%)
_______________________________

Full Details Located here:
ciq/ciq_backports/kernel-4.18.0-553.45.1.el8_10/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual Empty Commit failures contained in the same containing directory.
The git message for empty commits will have the path for the failed commit.
File names are the first 8 characters of the upstream SHA

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.46.1.el8_10 commit-author Dave Airlie <[email protected]> commit 1f9910b The fence sync logic doesn't handle a fence sync across devices as it tries to write to a channel offset from one device into the fence bo from a different device, which won't work so well. This patch fixes that to avoid using the sync path in the case where the fences come from different nouveau drm devices. This works fine on a single device as the fence bo is shared across the devices, and mapped into each channels vma space, the channel offsets are therefore okay to pass between sides, so one channel can sync on the seqnos from the other by using the offset into it's vma. Signed-off-by: Dave Airlie <[email protected]> Cc: [email protected] Reviewed-by: Ben Skeggs <[email protected]> [ Fix compilation issue; remove version log from commit messsage. - Danilo ] Signed-off-by: Danilo Krummrich <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 1f9910b) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741 cve CVE-2025-21785 Rebuild_History Non-Buildable kernel-4.18.0-553.46.1.el8_10 commit-author Radu Rendec <[email protected]> commit 875d742 The loop that detects/populates cache information already has a bounds check on the array size but does not account for cache levels with separate data/instructions cache. Fix this by incrementing the index for any populated leaf (instead of any populated level). Fixes: 5d425c1 ("arm64: kernel: add support for cpu cache information") Signed-off-by: Radu Rendec <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]> (cherry picked from commit 875d742) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.46.1.el8_10 commit-author Kirill A. Shutemov <[email protected]> commit 1b8b1aa Yingcong has noticed that on the 5-level paging machine, VDSO and VVAR VMAs are placed above the 47-bit border: 8000001a9000-8000001ad000 r--p 00000000 00:00 0 [vvar] 8000001ad000-8000001af000 r-xp 00000000 00:00 0 [vdso] This might confuse users who are not aware of 5-level paging and expect all userspace addresses to be under the 47-bit border. So far problem has only been triggered with ASLR disabled, although it may also occur with ASLR enabled if the layout is randomized in a just right way. The problem happens due to custom placement for the VMAs in the VDSO code: vdso_addr() tries to place them above the stack and checks the result against TASK_SIZE_MAX, which is wrong. TASK_SIZE_MAX is set to the 56-bit border on 5-level paging machines. Use DEFAULT_MAP_WINDOW instead. Fixes: b569bab ("x86/mm: Prepare to expose larger address space to userspace") Reported-by: Yingcong Wu <[email protected]> Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/all/20230803151609.22141-1-kirill.shutemov%40linux.intel.com (cherry picked from commit 1b8b1aa) Signed-off-by: Jonathan Maple <[email protected]>
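
The address check described in the commit message above can be illustrated with plain arithmetic. This is a sketch under assumptions: the two borders are written as round powers of two (the kernel's actual TASK_SIZE_MAX and DEFAULT_MAP_WINDOW constants may differ slightly from these), and the addresses are the [vvar] and [vdso] start addresses quoted in the message.

```python
# Round-number borders for illustration; the real kernel constants may differ slightly.
BORDER_47_BIT = 1 << 47  # what userspace unaware of 5-level paging expects
BORDER_56_BIT = 1 << 56  # border used by the old check on 5-level paging machines

# Start addresses quoted in the commit message.
VVAR_START = 0x8000001A9000
VDSO_START = 0x8000001AD000


def passes_old_check(addr: int) -> bool:
    """Old behaviour as described: only reject addresses past the 56-bit border."""
    return addr < BORDER_56_BIT


def passes_new_check(addr: int) -> bool:
    """New behaviour as described: keep the mapping under the 47-bit border."""
    return addr < BORDER_47_BIT


if __name__ == "__main__":
    for name, addr in (("vvar", VVAR_START), ("vdso", VDSO_START)):
        print(name, hex(addr), passes_old_check(addr), passes_new_check(addr))
```

Both quoted addresses pass the old comparison but fail the new one, which matches the symptom reported in the message.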

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.46.1.el8_10 commit-author Niklas Schnelle <[email protected]> commit dc287e4 Since commit 25f39d3 ("s390/pci: Ignore RID for isolated VFs") PFs which are not initially configured but in standby are considered isolated. That is they create only a single function PCI domain. Due to the PCI domains being created on discovery, this means that even if they are configured later on, sibling PFs and their child VFs will not be added to their PCI domain breaking SR-IOV expectations. The reason the referenced commit ignored standby PFs for the creation of multi-function PCI subhierarchies, was to work around a PCI domain renumbering scenario on reboot. The renumbering would occur after removing a previously in standby PF, whose domain number is used for its configured sibling PFs and their child VFs, but which itself remained in standby. When this is followed by a reboot, the sibling PF is used instead to determine the PCI domain number of it and its child VFs. In principle it is not possible to know which standby PFs will be configured later and which may be removed. The PCI domain and root bus are pre-requisites for hotplug slots so the decision of which functions belong to which domain can not be postponed. With the renumbering occurring only in rare circumstances and being generally benign, accept it as an oddity and fix SR-IOV for initially standby PFs simply by allowing them to create PCI domains. Cc: [email protected] Reviewed-by: Gerd Bayer <[email protected]> Fixes: 25f39d3 ("s390/pci: Ignore RID for isolated VFs") Signed-off-by: Niklas Schnelle <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]> (cherry picked from commit dc287e4) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.46.1.el8_10 commit-author Niklas Schnelle <[email protected]> commit 0579388 This creates a new zpci_iov_find_parent_pf() function which a future commit can use to find if a VF has a configured parent PF. Use zdev->rid instead of zdev->devfn such that the new function can be used before it has been decided if the RID will be exposed and zdev->devfn is set. Also handle the hypotheical case that the RID is not available but there is an otherwise matching zbus. Fixes: 25f39d3 ("s390/pci: Ignore RID for isolated VFs") Cc: [email protected] Reviewed-by: Halil Pasic <[email protected]> Signed-off-by: Niklas Schnelle <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]> (cherry picked from commit 0579388) Signed-off-by: Jonathan Maple <[email protected]>

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.46.1.el8_10 commit-author Niklas Schnelle <[email protected]> commit 2844ddb In contrast to the commit message of the fixed commit VFs whose parent PF is not configured are not always isolated, that is put on their own PCI domain. This is because for VFs to be added to an existing PCI domain it is enough for that PCI domain to share the same topology ID or PCHID. Such a matching PCI domain without a parent PF may exist when a PF from the same PCI card created the domain with the VF being a child of a different, non accessible, PF. While not causing technical issues it makes the rules which VFs are isolated inconsistent. Fix this by explicitly checking that the parent PF exists on the PCI domain determined by the topology ID or PCHID before registering the VF. This works because a parent PF which is under control of this Linux instance must be enabled and configured at the point where its child VFs appear because otherwise SR-IOV could not have been enabled on the parent. Fixes: 25f39d3 ("s390/pci: Ignore RID for isolated VFs") Cc: [email protected] Reviewed-by: Halil Pasic <[email protected]> Signed-off-by: Niklas Schnelle <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]> (cherry picked from commit 2844ddb) Signed-off-by: Jonathan Maple <[email protected]>

Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 12
Number of commits matched with upstream: 6 (50.00%)
Number of commits in upstream but not in rpm: 538201
Number of commits NOT found in upstream: 6 (50.00%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.46.1.el8_10 for kernel-4.18.0-553.46.1.el8_10
Clean Cherry Picks: 6 (100.00%)
Empty Cherry Picks: 0 (0.00%)
_______________________________

Full Details Located here:
ciq/ciq_backports/kernel-4.18.0-553.46.1.el8_10/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual Empty Commit failures contained in the same containing directory.
The git message for empty commits will have the path for the failed commit.
File names are the first 8 characters of the upstream SHA
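
For reviewers who want to sanity-check the summary blocks, the counts can be cross-checked with a few lines of Python. This is a hedged sketch, not part of the rebuild tooling: the `Summary` type and `check` helper are hypothetical, and the numbers are copied by hand from the four summaries in this PR.

```python
from dataclasses import dataclass

UPSTREAM_TOTAL = 538207  # "Number of commits in upstream range v4.18~1..master"


@dataclass
class Summary:
    """Hypothetical container for one rebuild summary block."""
    rpm: int        # commits in rpm
    matched: int    # commits matched with upstream
    not_found: int  # commits NOT found in upstream
    clean: int      # clean cherry picks
    empty: int      # empty cherry picks


def pct(part: int, whole: int) -> str:
    """Format part/whole the way the summaries do (two decimals)."""
    return f"{100 * part / whole:.2f}%" if whole else "n/a"


def check(s: Summary) -> int:
    """Verify the internal consistency of one summary; return 'upstream but not in rpm'."""
    assert s.matched + s.not_found == s.rpm
    assert s.clean + s.empty == s.matched
    return UPSTREAM_TOTAL - s.matched


SUMMARIES = {
    "553.44.1": Summary(24, 18, 6, 14, 4),
    "553.45.1": Summary(19, 13, 6, 11, 2),
    "553.46.1": Summary(12, 6, 6, 6, 0),
    "553.47.1": Summary(9, 3, 6, 1, 2),
}

if __name__ == "__main__":
    for name, s in SUMMARIES.items():
        print(name, check(s), pct(s.matched, s.rpm), pct(s.clean, s.matched))
```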

jira LE-2741
cve CVE-2024-50138
Rebuild_History Non-Buildable kernel-4.18.0-553.47.1.el8_10
commit-author Wander Lairson Costa <[email protected]>
commit 8b62645
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.47.1.el8_10/8b62645b.failed

The function __bpf_ringbuf_reserve is invoked from a tracepoint, which disables preemption. Using spinlock_t in this context can lead to a "sleep in atomic" warning in the RT variant. This issue is illustrated in the example below:

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 556208, name: test_progs
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 1
INFO: lockdep is turned off.
Preemption disabled at:
[<ffffd33a5c88ea44>] migrate_enable+0xc0/0x39c
CPU: 7 PID: 556208 Comm: test_progs Tainted: G
Hardware name: Qualcomm SA8775P Ride (DT)
Call trace:
 dump_backtrace+0xac/0x130
 show_stack+0x1c/0x30
 dump_stack_lvl+0xac/0xe8
 dump_stack+0x18/0x30
 __might_resched+0x3bc/0x4fc
 rt_spin_lock+0x8c/0x1a4
 __bpf_ringbuf_reserve+0xc4/0x254
 bpf_ringbuf_reserve_dynptr+0x5c/0xdc
 bpf_prog_ac3d15160d62622a_test_read_write+0x104/0x238
 trace_call_bpf+0x238/0x774
 perf_call_bpf_enter.isra.0+0x104/0x194
 perf_syscall_enter+0x2f8/0x510
 trace_sys_enter+0x39c/0x564
 syscall_trace_enter+0x220/0x3c0
 do_el0_svc+0x138/0x1dc
 el0_svc+0x54/0x130
 el0t_64_sync_handler+0x134/0x150
 el0t_64_sync+0x17c/0x180

Switch the spinlock to raw_spinlock_t to avoid this error.

Fixes: 457f443 ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: Brian Grech <[email protected]>
Signed-off-by: Wander Lairson Costa <[email protected]>
Signed-off-by: Wander Lairson Costa <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 8b62645)
Signed-off-by: Jonathan Maple <[email protected]>

# Conflicts:
# kernel/bpf/ringbuf.c

jira LE-2741 Rebuild_History Non-Buildable kernel-4.18.0-553.47.1.el8_10 commit-author Heiner Kallweit <[email protected]> commit f32a213 Empty-Commit: Cherry-Pick Conflicts during history rebuild. Will be included in final tarball splat. Ref for failed cherry-pick at: ciq/ciq_backports/kernel-4.18.0-553.47.1.el8_10/f32a2137.failed If a network device is runtime-suspended then: - network device may be flagged as detached and all ethtool ops (even if not accessing the device) will fail because netif_device_present() returns false - ethtool ops may fail because device is not accessible (e.g. because being in D3 in case of a PCI device) It may not be desirable that userspace can't use even simple ethtool ops that not access the device if interface or link is down. To be more friendly to userspace let's ensure that device is runtime-resumed when executing the respective ethtool op in kernel. Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: David S. Miller <[email protected]> (cherry picked from commit f32a213) Signed-off-by: Jonathan Maple <[email protected]> # Conflicts: # net/ethtool/ioctl.c

jira LE-2741
Rebuild_History Non-Buildable kernel-4.18.0-553.47.1.el8_10
commit-author Scott Mayhew <[email protected]>
commit 0c8c7c5

This is a slight variation on a patch previously proposed by Neil Brown that never got merged.

Prior to commit 5ceb9d7 ("NFS: Refactor nfs_lookup_revalidate()"), any error from nfs_lookup_verify_inode() other than -ESTALE would result in nfs_lookup_revalidate() returning that error (-ESTALE is mapped to zero).

Since that commit, all errors result in nfs_lookup_revalidate() returning zero, resulting in dentries being invalidated where they previously were not (particularly in the case of -ERESTARTSYS).

Fix it by passing the actual error code to nfs_lookup_revalidate_done(), and leaving the decision on whether to map the error code to zero or one to nfs_lookup_revalidate_done().

A simple reproducer is to run the following python code in a subdirectory of an NFS mount (not in the root of the NFS mount):

---8<---
import os
import multiprocessing
import time

if __name__=="__main__":
    multiprocessing.set_start_method("spawn")

    count = 0
    while True:
        try:
            os.getcwd()
            pool = multiprocessing.Pool(10)
            pool.close()
            pool.terminate()
            count += 1
        except Exception as e:
            print(f"Failed after {count} iterations")
            print(e)
            break
---8<---

Prior to commit 5ceb9d7, the above code would run indefinitely. After commit 5ceb9d7, it fails almost immediately with -ENOENT.

Signed-off-by: Scott Mayhew <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
(cherry picked from commit 0c8c7c5)
Signed-off-by: Jonathan Maple <[email protected]>

Rebuild_History BUILDABLE
Rebuilding Kernel from rpm changelog with Fuzz Limit: 87.50%
Number of commits in upstream range v4.18~1..master: 538207
Number of commits in rpm: 9
Number of commits matched with upstream: 3 (33.33%)
Number of commits in upstream but not in rpm: 538204
Number of commits NOT found in upstream: 6 (66.67%)

Rebuilding Kernel on Branch rocky8_10_rebuild_kernel-4.18.0-553.47.1.el8_10 for kernel-4.18.0-553.47.1.el8_10
Clean Cherry Picks: 1 (33.33%)
Empty Cherry Picks: 2 (66.67%)
_______________________________

Full Details Located here:
ciq/ciq_backports/kernel-4.18.0-553.47.1.el8_10/rebuild.details.txt

Includes:
* git commit header above
* Empty Commits with upstream SHA
* RPM ChangeLog Entries that could not be matched

Individual Empty Commit failures contained in the same containing directory.
The git message for empty commits will have the path for the failed commit.
File names are the first 8 characters of the upstream SHA
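The rebuild details above mention an 87.50% fuzz limit and failure files named after the first 8 characters of the upstream SHA. The sketch below shows roughly how such a lookup could work. It is only an illustration: the rebuild tooling's actual matching algorithm is not shown in this PR, so `difflib.SequenceMatcher` is a stand-in, and the helper names `best_upstream_match` and `failed_pick_path` are hypothetical.

```python
from difflib import SequenceMatcher

# The 87.50% limit quoted in rebuild.details.txt, expressed as a ratio.
FUZZ_LIMIT = 0.875


def best_upstream_match(changelog_subject, upstream_subjects, limit=FUZZ_LIMIT):
    """Return (subject, ratio) for the closest upstream subject line.

    Returns None when even the closest candidate scores below the limit.
    SequenceMatcher is an assumption; the real tool may score differently.
    """
    best = None
    for subject in upstream_subjects:
        ratio = SequenceMatcher(
            None, changelog_subject.lower(), subject.lower()
        ).ratio()
        if best is None or ratio > best[1]:
            best = (subject, ratio)
    if best is not None and best[1] >= limit:
        return best
    return None


def failed_pick_path(kernel_version, upstream_sha):
    """Path of a failed cherry-pick record: first 8 characters of the SHA."""
    return f"ciq/ciq_backports/{kernel_version}/{upstream_sha[:8]}.failed"
```

A changelog entry whose closest upstream subject scores below the limit would be reported under "Number of commits NOT found in upstream", which is how an entry such as the skb frag cache one ends up flagged for manual review.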

thefossguy-ciq · 2025-04-09T11:12:33Z

@PlaidCat the missing commit (net: skb: exclude the single page frag cache for too small alloc) might be this revert: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=011b0335903832facca86cd8ed05d7d8d94c9c76

thefossguy-ciq

I didn't perform a 1-to-1 match of the commits to the SRPM changelog but with a relatively close look, nothing looks off. LGTM!

🚤

bmastbergen

🥌

PlaidCat · 2025-04-09T13:54:56Z

> @PlaidCat the missing commit (net: skb: exclude the single page frag cache for too small alloc) might be this revert: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=011b0335903832facca86cd8ed05d7d8d94c9c76

Possibly. Like I've said before, this process is imperfect. That's why the only buildable commits are the ones titled "Rebuild rocky8_10 with kernel-<src.rpm>": they replace the entire directory with the output of rpmbuild -bp <src.rpm>, and the delta is huge. The important thing is that we're not missing a bunch of commits. @bmastbergen caught this before, when the kernel.org/master checkout I had was stale, so it missed a lot of N+2month commits that it should have caught.

At any rate, thanks for the review.
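The stale-checkout problem mentioned above could be guarded against with a freshness check before the rebuild runs. The following is a minimal sketch under stated assumptions: the 7-day threshold, the `master` ref name, and both function names are illustrative and are not taken from the actual rebuild tooling.

```python
import subprocess
import time


def last_commit_epoch(repo_path, ref="master"):
    """Committer timestamp (Unix epoch) of the tip of `ref` in a local checkout."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "-1", "--format=%ct", ref],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return int(out.strip())


def is_stale(last_commit, now, max_age_days=7):
    """True when the newest commit is older than max_age_days (assumed threshold)."""
    return (now - last_commit) > max_age_days * 86400


if __name__ == "__main__":
    # Hypothetical usage: check the checkout in the current directory.
    if is_stale(last_commit_epoch("."), time.time()):
        print("upstream checkout looks stale; fetch before matching commits")
```

Keeping the age comparison in its own function separates the policy (how old is too old) from the git call, so the threshold can be adjusted without touching the subprocess code.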

When sending a packet with virtio_net_hdr to tun device, if the gso_type in virtio_net_hdr is SKB_GSO_UDP and the gso_size is less than udphdr size, below crash may happen.

  ------------[ cut here ]------------
  kernel BUG at net/core/skbuff.c:4572!
  Oops: invalid opcode: 0000 [#1] SMP NOPTI
  CPU: 0 UID: 0 PID: 62 Comm: mytest Not tainted 6.16.0-rc7 #203 PREEMPT(voluntary)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  RIP: 0010:skb_pull_rcsum+0x8e/0xa0
  Code: 00 00 5b c3 cc cc cc cc 8b 93 88 00 00 00 f7 da e8 37 44 38 00 f7 d8 89 83 88 00 00 00 48 8b 83 c8 00 00 00 5b c3 cc cc cc cc <0f> 0b 0f 0b 66 66 2e 0f 1f 84 00 000
  RSP: 0018:ffffc900001fba38 EFLAGS: 00000297
  RAX: 0000000000000004 RBX: ffff8880040c1000 RCX: ffffc900001fb948
  RDX: ffff888003e6d700 RSI: 0000000000000008 RDI: ffff88800411a062
  RBP: ffff8880040c1000 R08: 0000000000000000 R09: 0000000000000001
  R10: ffff888003606c00 R11: 0000000000000001 R12: 0000000000000000
  R13: ffff888004060900 R14: ffff888004050000 R15: ffff888004060900
  FS: 000000002406d3c0(0000) GS:ffff888084a19000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000020000040 CR3: 0000000004007000 CR4: 00000000000006f0
  Call Trace:
   <TASK>
   udp_queue_rcv_one_skb+0x176/0x4b0 net/ipv4/udp.c:2445
   udp_queue_rcv_skb+0x155/0x1f0 net/ipv4/udp.c:2475
   udp_unicast_rcv_skb+0x71/0x90 net/ipv4/udp.c:2626
   __udp4_lib_rcv+0x433/0xb00 net/ipv4/udp.c:2690
   ip_protocol_deliver_rcu+0xa6/0x160 net/ipv4/ip_input.c:205
   ip_local_deliver_finish+0x72/0x90 net/ipv4/ip_input.c:233
   ip_sublist_rcv_finish+0x5f/0x70 net/ipv4/ip_input.c:579
   ip_sublist_rcv+0x122/0x1b0 net/ipv4/ip_input.c:636
   ip_list_rcv+0xf7/0x130 net/ipv4/ip_input.c:670
   __netif_receive_skb_list_core+0x21d/0x240 net/core/dev.c:6067
   netif_receive_skb_list_internal+0x186/0x2b0 net/core/dev.c:6210
   napi_complete_done+0x78/0x180 net/core/dev.c:6580
   tun_get_user+0xa63/0x1120 drivers/net/tun.c:1909
   tun_chr_write_iter+0x65/0xb0 drivers/net/tun.c:1984
   vfs_write+0x300/0x420 fs/read_write.c:593
   ksys_write+0x60/0xd0 fs/read_write.c:686
   do_syscall_64+0x50/0x1c0 arch/x86/entry/syscall_64.c:63
   </TASK>

To trigger gso segment in udp_queue_rcv_skb(), we should also set option UDP_ENCAP_ESPINUDP to enable udp_sk(sk)->encap_rcv. When the encap_rcv hook return 1 in udp_queue_rcv_one_skb(), udp_csum_pull_header() will try to pull udphdr, but the skb size has been segmented to gso size, which leads to this crash.

Previous commit cf329aa ("udp: cope with UDP GRO packet misdirection") introduces segmentation in UDP receive path only for GRO, which was never intended to be used for UFO, so drop UFO packets in udp_rcv_segment().

Link: https://lore.kernel.org/netdev/[email protected]/
Link: https://lore.kernel.org/netdev/[email protected]/
Fixes: cf329aa ("udp: cope with UDP GRO packet misdirection")
Suggested-by: Willem de Bruijn <[email protected]>
Signed-off-by: Wang Liang <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>

[ Upstream commit d46e51f ]

When sending a packet with virtio_net_hdr to tun device, if the gso_type in virtio_net_hdr is SKB_GSO_UDP and the gso_size is less than udphdr size, below crash may happen.

  ------------[ cut here ]------------
  kernel BUG at net/core/skbuff.c:4572!
  Oops: invalid opcode: 0000 [#1] SMP NOPTI
  CPU: 0 UID: 0 PID: 62 Comm: mytest Not tainted 6.16.0-rc7 #203 PREEMPT(voluntary)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  RIP: 0010:skb_pull_rcsum+0x8e/0xa0
  Code: 00 00 5b c3 cc cc cc cc 8b 93 88 00 00 00 f7 da e8 37 44 38 00 f7 d8 89 83 88 00 00 00 48 8b 83 c8 00 00 00 5b c3 cc cc cc cc <0f> 0b 0f 0b 66 66 2e 0f 1f 84 00 000
  RSP: 0018:ffffc900001fba38 EFLAGS: 00000297
  RAX: 0000000000000004 RBX: ffff8880040c1000 RCX: ffffc900001fb948
  RDX: ffff888003e6d700 RSI: 0000000000000008 RDI: ffff88800411a062
  RBP: ffff8880040c1000 R08: 0000000000000000 R09: 0000000000000001
  R10: ffff888003606c00 R11: 0000000000000001 R12: 0000000000000000
  R13: ffff888004060900 R14: ffff888004050000 R15: ffff888004060900
  FS: 000000002406d3c0(0000) GS:ffff888084a19000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000020000040 CR3: 0000000004007000 CR4: 00000000000006f0
  Call Trace:
   <TASK>
   udp_queue_rcv_one_skb+0x176/0x4b0 net/ipv4/udp.c:2445
   udp_queue_rcv_skb+0x155/0x1f0 net/ipv4/udp.c:2475
   udp_unicast_rcv_skb+0x71/0x90 net/ipv4/udp.c:2626
   __udp4_lib_rcv+0x433/0xb00 net/ipv4/udp.c:2690
   ip_protocol_deliver_rcu+0xa6/0x160 net/ipv4/ip_input.c:205
   ip_local_deliver_finish+0x72/0x90 net/ipv4/ip_input.c:233
   ip_sublist_rcv_finish+0x5f/0x70 net/ipv4/ip_input.c:579
   ip_sublist_rcv+0x122/0x1b0 net/ipv4/ip_input.c:636
   ip_list_rcv+0xf7/0x130 net/ipv4/ip_input.c:670
   __netif_receive_skb_list_core+0x21d/0x240 net/core/dev.c:6067
   netif_receive_skb_list_internal+0x186/0x2b0 net/core/dev.c:6210
   napi_complete_done+0x78/0x180 net/core/dev.c:6580
   tun_get_user+0xa63/0x1120 drivers/net/tun.c:1909
   tun_chr_write_iter+0x65/0xb0 drivers/net/tun.c:1984
   vfs_write+0x300/0x420 fs/read_write.c:593
   ksys_write+0x60/0xd0 fs/read_write.c:686
   do_syscall_64+0x50/0x1c0 arch/x86/entry/syscall_64.c:63
   </TASK>

To trigger gso segment in udp_queue_rcv_skb(), we should also set option UDP_ENCAP_ESPINUDP to enable udp_sk(sk)->encap_rcv. When the encap_rcv hook return 1 in udp_queue_rcv_one_skb(), udp_csum_pull_header() will try to pull udphdr, but the skb size has been segmented to gso size, which leads to this crash.

Previous commit cf329aa ("udp: cope with UDP GRO packet misdirection") introduces segmentation in UDP receive path only for GRO, which was never intended to be used for UFO, so drop UFO packets in udp_rcv_segment().

Link: https://lore.kernel.org/netdev/[email protected]/
Link: https://lore.kernel.org/netdev/[email protected]/
Fixes: cf329aa ("udp: cope with UDP GRO packet misdirection")
Suggested-by: Willem de Bruijn <[email protected]>
Signed-off-by: Wang Liang <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>

PlaidCat added 30 commits April 8, 2025 17:00

PlaidCat added 17 commits April 8, 2025 17:02

PlaidCat requested review from bmastbergen, kerneltoast, thefossguy-ciq and trinity-q April 8, 2025 22:40

PlaidCat self-assigned this Apr 8, 2025

thefossguy-ciq approved these changes Apr 9, 2025

View reviewed changes

bmastbergen approved these changes Apr 9, 2025

View reviewed changes

PlaidCat merged commit 01aef32 into rocky8_10 Apr 9, 2025
2 checks passed

PlaidCat deleted the rocky8_10_rebuild branch April 9, 2025 13:57

